Welcome everybody to today's deep learning lecture.
Today we want to talk a bit about common practices, the stuff that you need to know to get everything
implemented in practice.
So I have a small outline over the next couple of videos and the topics that we will look
at.
So we will think about the problems that we currently have and how far we have come.
We will talk about training strategies, revisit optimization and learning rates with a couple
of tricks on how to adjust them, and look at architecture selection and hyperparameter optimization.
One trick that is really useful is ensembling.
People typically have to deal with class imbalance, and of course there are very interesting
approaches to address this.
So finally, we look into evaluation: how to get a good predictor and how to estimate how well
our network is actually performing.
So far we have seen all the nuts and bolts of how to train the network.
We have seen different layers from fully connected to convolutional ones, the activation functions,
the loss functions, the optimization, and regularization. Today we will talk about how to choose
the architecture, train, and evaluate a deep network.
The very first thing is testing.
Ideally the test data should be kept in a vault and brought out only at the end of
the data analysis, as Hastie and colleagues teach in The Elements of Statistical
Learning.
So first things first, overfitting is extremely easy with neural networks.
Remember the ImageNet random-label experiments?
The true test set error can be underestimated substantially, and the generalization ability
overestimated, when you use the test set for model selection.
So when we choose the architecture, which is typically the first step in model selection,
this should never be done on the test set.
We can do initial experimentation on a smaller subset of the data and try to figure out what
works.
Never work on the test set when you are doing these things.
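To make this concrete, here is a minimal sketch of how the data could be split so that the test portion stays locked away until the very end. The split fractions, the seed, and the NumPy-based indexing are illustrative assumptions, not something prescribed in the lecture.

```python
import numpy as np

def train_val_test_split(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle indices once and lock the test portion away.

    Model selection (architectures, hyperparameters) only ever sees the
    train and validation indices; the test indices are touched exactly
    once, for the final evaluation.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test_idx = idx[:n_test]                  # "in the vault" until the end
    val_idx = idx[n_test:n_test + n_val]     # used for model selection
    train_idx = idx[n_test + n_val:]         # used for fitting the weights
    return train_idx, val_idx, test_idx
```

Initial experimentation can then be done on a smaller subset of the training indices, while the validation indices are used to compare architectures and hyperparameters.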
Let's look at a couple of training strategies.
Before training, check your gradients, check the loss function, and check that your own layer
implementations compute their results correctly.
If you implemented your own layer, then compare the analytic and the numeric gradient.
You can use central differences for the numeric gradient.
Use relative errors instead of absolute differences, and pay attention to numerics.
Use double precision for checking, temporarily scale the loss function if you observe
very small values, and choose the step size h appropriately.
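As a rough sketch of such a check, assuming the loss can be evaluated as a function of a flat parameter vector, a central-difference comparison in double precision could look like this (the function name and the relative-error handling are illustrative):

```python
import numpy as np

def grad_check(f, w, analytic_grad, h=1e-5):
    """Compare an analytic gradient to a central-difference estimate.

    f             -- loss as a function of a flat parameter vector
    w             -- parameters (cast to double precision for the check)
    analytic_grad -- gradient from your backward pass, same shape as w
    h             -- step size; if loss values are very small, pick h accordingly
    """
    w = np.asarray(w, dtype=np.float64)
    numeric = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += h
        w_minus[i] -= h
        numeric[i] = (f(w_plus) - f(w_minus)) / (2.0 * h)  # central difference
    # relative error instead of an absolute difference
    denom = np.maximum(np.abs(analytic_grad) + np.abs(numeric), 1e-12)
    return np.max(np.abs(analytic_grad - numeric) / denom)
```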
Then we have a couple of additional recommendations.
If you only use a few data points, you will have fewer issues with non-differentiable
parts of the loss function.
You can train the network for a short period of time and then perform the gradient checks.
Check the gradient first without, then with, the regularization terms: turn the
regularization terms off, check the gradient, and finally check it again with the
regularization terms enabled.
Also turn off data augmentation and dropout.
So typically you make this check on rather small data sets.
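A small self-contained sketch of this procedure, using a toy mean-squared-error loss with an optional L2 term on just a few data points; the shapes, the model, and the regularization weight are made up for illustration:

```python
import numpy as np

# Toy setup in double precision: only a few data points, a linear model,
# a mean-squared-error loss, and an optional L2 regularization term.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))   # just a few data points
t = rng.standard_normal(4)        # targets
w = rng.standard_normal(3)

def loss(w, reg):
    err = X @ w - t
    return 0.5 * np.mean(err ** 2) + 0.5 * reg * np.sum(w ** 2)

def analytic_grad(w, reg):
    return X.T @ (X @ w - t) / len(t) + reg * w

def numeric_grad(w, reg, h=1e-6):
    g = np.zeros_like(w)
    for i in range(w.size):
        wp, wm = w.copy(), w.copy()
        wp[i] += h
        wm[i] -= h
        g[i] = (loss(wp, reg) - loss(wm, reg)) / (2.0 * h)  # central difference
    return g

# First check without, then with the regularization term.
for reg in (0.0, 0.1):
    a, n = analytic_grad(w, reg), numeric_grad(w, reg)
    rel = np.abs(a - n) / np.maximum(np.abs(a) + np.abs(n), 1e-12)
    print(f"reg={reg}: max relative error {rel.max():.2e}")
```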
The goal of the initialization check is to verify that you have a correct random initialization
of the layers. So you can compute the loss on the untrained network with regularization turned
off, and of course that should correspond to a random classification, i.e., every class being
predicted with roughly equal probability.
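For a softmax classifier with cross-entropy loss over C classes, random predictions correspond to a loss of about -ln(1/C) = ln(C). A minimal sketch of this sanity check with a hypothetical, untrained random linear layer standing in for the network:

```python
import numpy as np

num_classes = 10   # hypothetical class count; use your own
rng = np.random.default_rng(0)

# Hypothetical untrained "network": a linear layer with small random weights.
X = rng.standard_normal((256, 64))
W = rng.standard_normal((64, num_classes)) * 0.01
y = rng.integers(0, num_classes, size=256)

# Softmax cross-entropy, with regularization turned off.
scores = X @ W
scores -= scores.max(axis=1, keepdims=True)
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(len(y)), y]).mean()

print(f"initial loss {loss:.3f} vs. expected ln(C) = {np.log(num_classes):.3f}")
```

If the initial loss deviates strongly from ln(C), the initialization or the loss implementation deserves another look before any training is started.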